Abstract

The joint-sparse recovery problem aims to recover, from sets of compressed measurements, 
unknown sparse matrices with nonzero entries restricted to a subset of rows. This is an 
extension of the single-measurement-vector (SMV) problem widely studied in compressed 
sensing. We analyze the recovery properties for two types of recovery algorithms. First, we 
show that recovery using sum-of-norm minimization cannot exceed the uniform recovery rate 
of sequential SMV using $\ell_1$ minimization, and that there are problems that can be solved with
one approach but not with the other. Second, we analyze the performance of the ReMBo 
algorithm [M. Mishali and Y. Eldar, IEEE Trans. Sig. Proc., 56 (2008)] in combination with
$\ell_1$ minimization, and show how recovery improves as more measurements are taken. From
this analysis it follows that having more measurements than number of nonzero rows does not 
improve the potential theoretical recovery rate. 



1 Introduction 

A problem of central importance in compressed sensing [1, 10] is the following: given an $m \times n$ matrix $A$, and a measurement vector $b = Ax_0$, recover $x_0$. When $m < n$, this problem is ill-posed, and it is not generally possible to uniquely recover $x_0$ without some prior information. In many important cases, $x_0$ is known to be sparse, and it may be appropriate to solve

\[ \underset{x}{\text{minimize}} \quad \|x\|_0 \quad \text{subject to} \quad Ax = b, \tag{1.1} \]

to find the sparsest possible solution. (The $\ell_0$-norm $\|\cdot\|_0$ of a vector counts the number of nonzero entries.) If $x_0$ has fewer than $s/2$ nonzero entries, where $s$ is the number of nonzeros in the sparsest null-vector of $A$, then $x_0$ is the unique solution of this optimization problem [12, 19]. The main obstacle of this approach is that it is combinatorial [24], and therefore impractical for all but the smallest problems. To overcome this, Chen et al. [6] introduced basis pursuit:

\[ \underset{x}{\text{minimize}} \quad \|x\|_1 \quad \text{subject to} \quad Ax = b. \tag{1.2} \]

This convex relaxation, based on the $\ell_1$-norm, can be solved much more efficiently; moreover, under certain conditions [2, 11], it yields the same solution as the $\ell_0$ problem (1.1).

A natural extension of the single-measurement-vector (SMV) problem just described is the multiple-measurement-vector (MMV) problem. Instead of a single measurement $b$, we are given a set of $r$ measurements

\[ b^{(k)} = A x_0^{(k)}, \quad k = 1, \ldots, r, \]

in which the vectors $x_0^{(k)}$ are jointly sparse, i.e., have nonzero entries at the same locations. Such problems arise in source localization [22], neuromagnetic imaging [8], and equalization of sparse communication channels [7, 15]. Succinctly, the aim of the MMV problem is to recover $X_0$ from observations $B = AX_0$, where $B = [b^{(1)}, b^{(2)}, \ldots, b^{(r)}]$ is an $m \times r$ matrix, and the $n \times r$ matrix $X_0$ is row sparse, i.e., it has nonzero entries in only a small number of rows. The most widely studied approach to the MMV problem is based on solving the convex optimization problem

\[ \underset{X}{\text{minimize}} \quad \|X\|_{p,q} \quad \text{subject to} \quad AX = B, \]

where the mixed $\ell_{p,q}$ norm of $X$ is defined as

\[ \|X\|_{p,q} := \Big( \sum_{j=1}^{n} \|X^{j\rightarrow}\|_q^p \Big)^{1/p}, \]

and $X^{j\rightarrow}$ is the (column) vector whose entries form the $j$th row of $X$. In particular, Cotter et al. [8] consider $p = 2$, $q < 1$; Tropp [28, 29] analyzes $p = 1$, $q = \infty$; Malioutov et al. [22] and Eldar and Mishali [14] use $p = 1$, $q = 2$; and Chen and Huo [5] study $p = 1$, $q > 1$. A different approach is given by Mishali and Eldar [23], who propose the ReMBo algorithm, which reduces MMV to a series of SMV problems.
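As a small illustration of the mixed norm, the following sketch evaluates the $\ell_p$ norm of the vector of row-wise $\ell_q$ norms for a matrix stored as a list of rows. The function names `vector_norm`, `mixed_norm`, and `row_support` are hypothetical helpers introduced here, not part of any library, and the sketch assumes $p, q > 0$ (or infinity).

```python
import math

def vector_norm(v, q):
    """l_q norm of a list of numbers; q may be math.inf."""
    if q == math.inf:
        return max(abs(a) for a in v)
    return sum(abs(a) ** q for a in v) ** (1.0 / q)

def mixed_norm(X, p, q):
    """Mixed l_{p,q} norm: the l_p norm of the vector of row-wise l_q norms."""
    return vector_norm([vector_norm(row, q) for row in X], p)

def row_support(X):
    """Indices (0-based) of the rows that contain at least one nonzero entry."""
    return [j for j, row in enumerate(X) if any(a != 0 for a in row)]

# A 3 x 2 row-sparse matrix with two nonzero rows.
X = [[3.0, 4.0],
     [0.0, 0.0],
     [1.0, 0.0]]

print(mixed_norm(X, 1, 2))  # sum of row-wise l_2 norms: 5 + 0 + 1
print(mixed_norm(X, 1, 1))  # sum of absolute values of all entries
print(row_support(X))
```

For $p = q = 1$ the mixed norm is simply the sum of the absolute values of all entries, which is why that case decouples column by column.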

In this paper we study the sum-of-norms problem and the conditions for uniform recovery of all $X_0$ with a fixed row support, and compare this against recovery using $\ell_{1,1}$. We then construct matrices $X_0$ that cannot be recovered using $\ell_{1,1}$ but for which $\ell_{1,2}$ does succeed, and vice versa. We then illustrate the individual recovery properties of $\ell_{1,1}$ and $\ell_{1,2}$ with empirical results. We further show how recovery via $\ell_{1,1}$ changes as the number of measurements increases, and propose a boosted-$\ell_1$ approach to improve on the $\ell_{1,1}$ approach. This analysis provides the starting point for our study of the recovery properties of ReMBo, based on a geometrical interpretation of this algorithm.

We begin in Section 2 by summarizing existing $\ell_0$-$\ell_1$ equivalence results, which give conditions under which the solution of the $\ell_1$ relaxation (1.2) coincides with the solution of the $\ell_0$ problem (1.1). In Section 3 we consider the $\ell_{1,2}$ mixed-norm and sum-of-norms formulations and compare their performance against $\ell_{1,1}$. In Sections 4 and 5 we examine two approaches that are based on sequential application of (1.2).



Notation. We assume throughout that $A$ is a full-rank matrix in $\mathbb{R}^{m \times n}$, and that $X_0$ is an $s$ row-sparse matrix in $\mathbb{R}^{n \times r}$. We follow the convention that all vectors are column vectors. For an arbitrary matrix $M$, its $j$th column is denoted by the column vector $M^{\downarrow j}$; its $i$th row is the transpose of the column vector $M^{i\rightarrow}$. The $i$th entry of a vector $v$ is denoted by $v_i$. We make exceptions for $e_i = I^{\downarrow i}$ and for $x_0$ (resp., $X_0$), which represents the sparse vector (resp., matrix) we want to recover. When there is no ambiguity we sometimes write $m_i$ to denote $M^{\downarrow i}$. When concatenating vectors into matrices, $[a, b, c]$ denotes horizontal concatenation and $[a; b; c]$ denotes vertical concatenation. When indexing with $\mathcal{I}$, we define the vector $v_{\mathcal{I}} := [v_i]_{i \in \mathcal{I}}$, and the $m \times |\mathcal{I}|$ matrix $A_{\mathcal{I}} := [A^{\downarrow j}]_{j \in \mathcal{I}}$. Row or column selection takes precedence over all other operators.



2 Existing results for $\ell_1$ recovery

The conditions under which (1.2) gives the sparsest possible solution have been studied by applying a number of different techniques. By far the most popular analytical approach is based on the restricted isometry property, introduced by Candès and Tao [3], which gives sufficient conditions for equivalence. Donoho [9] obtains necessary and sufficient (NS) conditions by analyzing the underlying geometry of (1.2). Several authors [12, 13, 19] characterize the NS-conditions in terms of properties of the kernel of $A$:

\[ \mathrm{Ker}(A) := \{x \mid Ax = 0\}. \]

Fuchs [16] and Tropp [27] express sufficient conditions in terms of the solution of the dual of (1.2):

\[ \underset{y}{\text{maximize}} \quad b^T y \quad \text{subject to} \quad \|A^T y\|_\infty \le 1. \tag{2.1} \]

In this paper we are mainly concerned with the geometric and kernel conditions. We use the geometrical interpretation of the problems to get a better understanding, and resort to the null-space properties of $A$ to analyze recovery. To make the discussion more self-contained, we briefly recall some of the relevant results in the next three sections.



2.1 The geometry of $\ell_1$ recovery

The set of all points of the unit $\ell_1$-ball, $\{x \in \mathbb{R}^n \mid \|x\|_1 \le 1\}$, can be formed by taking convex combinations of $\pm e_j$, the signed columns of the identity matrix. Geometrically this is equivalent to taking the convex hull of these vectors, giving the cross-polytope $\mathcal{C} := \mathrm{conv}\{\pm e_1, \pm e_2, \ldots, \pm e_n\}$. Likewise, we can look at the linear mapping $x \mapsto Ax$ for all points $x \in \mathcal{C}$, giving the polytope $\mathcal{P} = \{Ax \mid x \in \mathcal{C}\} = A\mathcal{C}$. The faces of $\mathcal{C}$ can be expressed as the convex hull of subsets of vertices, not including pairs that are reflections with respect to the origin (such pairs are sometimes erroneously referred to as antipodal, which is a slightly more general concept [21]). Under linear transformations, each face from the cross-polytope $\mathcal{C}$ either maps to a face on $\mathcal{P}$ or vanishes into the interior of $\mathcal{P}$.

The solution found by (1.2) can be interpreted as follows. Starting with a radius of zero, we slowly "inflate" $\mathcal{P}$ until it first touches $b$. The radius at which this happens corresponds to the $\ell_1$-norm of the solution $x^*$. The vertices whose convex hull is the face touching $b$ determine the location and sign of the nonzero entries of $x^*$, while the position where $b$ touches the face determines their relative weights. Donoho [9] shows that $x_0$ can be recovered from $b = Ax_0$ using (1.2) if and only if the face of the (scaled) cross-polytope containing $x_0$ maps to a face on $\mathcal{P}$. Two direct consequences are that recovery depends only on the sign pattern of $x_0$, and that the probability of recovering a random $s$-sparse vector is equal to the ratio of the number of $(s-1)$-faces in $\mathcal{P}$ to the number of $(s-1)$-faces in $\mathcal{C}$. That is, letting $\mathcal{F}_d(\mathcal{P})$ denote the collection of all $d$-faces [21] in $\mathcal{P}$, the probability of recovering $x_0$ using $\ell_1$ is given by

\[ \frac{|\mathcal{F}_{s-1}(A\mathcal{C})|}{|\mathcal{F}_{s-1}(\mathcal{C})|}. \]

When we need to find the recoverability of vectors restricted to a support $\mathcal{I}$, this probability becomes

\[ P_{\ell_1}(A, \mathcal{I}) = \frac{\mathcal{F}_{\mathcal{I}}(A\mathcal{C})}{\mathcal{F}_{\mathcal{I}}(\mathcal{C})}, \tag{2.2} \]

where $\mathcal{F}_{\mathcal{I}}(\mathcal{C}) = 2^{|\mathcal{I}|}$ denotes the number of faces in $\mathcal{C}$ formed by the convex hull of $\{\pm e_j\}_{j \in \mathcal{I}}$, and $\mathcal{F}_{\mathcal{I}}(A\mathcal{C})$ is the number of faces on $A\mathcal{C}$ generated by $\{\pm A^{\downarrow j}\}_{j \in \mathcal{I}}$.

2.2 Null-space properties and $\ell_1$ recovery

Equivalence results in terms of null-space properties generally characterize equivalence for the set of all vectors $x$ with a fixed support, which is defined as

\[ \mathrm{Supp}(x) := \{j \mid x_j \ne 0\}. \]

We say that $x$ can be uniformly recovered on $\mathcal{I} \subseteq \{1, \ldots, n\}$ if all $x$ with $\mathrm{Supp}(x) \subseteq \mathcal{I}$ can be recovered. The following theorem illustrates conditions for uniform recovery via $\ell_1$ on an index set; more general results are given by Gribonval and Nielsen [20].

Theorem 2.1 (Donoho and Elad [12], Gribonval and Nielsen [19]). Let $A$ be an $m \times n$ matrix and $\mathcal{I} \subseteq \{1, \ldots, n\}$ be a fixed index set. Then all $x_0 \in \mathbb{R}^n$ with $\mathrm{Supp}(x_0) \subseteq \mathcal{I}$ can be uniquely recovered from $b = Ax_0$ using basis pursuit (1.2) if and only if for all $z \in \mathrm{Ker}(A) \setminus \{0\}$,

\[ \sum_{j \in \mathcal{I}} |z_j| < \sum_{j \notin \mathcal{I}} |z_j|. \tag{2.3} \]

That is, the $\ell_1$-norm of $z$ on $\mathcal{I}$ is strictly less than the $\ell_1$-norm of $z$ on the complement $\mathcal{I}^c$.
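The null-space condition can be checked directly for a single kernel vector. The sketch below does this for a small hand-picked matrix whose kernel is one-dimensional, so that testing its spanning vector is enough (the condition is invariant under scaling of $z$). The function name is a hypothetical helper, and for a general $A$ the condition has to hold over the whole kernel, which this check does not attempt.

```python
def satisfies_null_space_condition(z, support):
    """Check the strict inequality sum_{j in I} |z_j| < sum_{j not in I} |z_j|."""
    on_support = sum(abs(z[j]) for j in support)
    off_support = sum(abs(v) for j, v in enumerate(z) if j not in support)
    return on_support < off_support

# A 2 x 3 example; its kernel is spanned by z = (1, 1, -1).
A = [[1, 0, 1],
     [0, 1, 1]]
z = [1, 1, -1]

# Confirm that z really lies in Ker(A).
assert all(sum(a * v for a, v in zip(row, z)) == 0 for row in A)

print(satisfies_null_space_condition(z, {0}))     # compares 1 with 2
print(satisfies_null_space_condition(z, {0, 1}))  # compares 2 with 1
```

In this example every vector supported on the first index alone passes the test, while the support $\{0, 1\}$ (0-based) fails it, so uniform recovery on that larger support is not possible with this $A$.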






2.3 Optimality conditions for $\ell_1$ recovery

Sufficient conditions for recovery can be derived from the first-order optimality conditions necessary for $x^*$ and $y^*$ to be solutions of (1.2) and (2.1) respectively. The Karush-Kuhn-Tucker (KKT) conditions are also sufficient in this case because the problems are convex. The Lagrangian function for (1.2) is given by

\[ \mathcal{L}(x, y) = \|x\|_1 - y^T (Ax - b); \]

the KKT conditions require that

\[ Ax = b \quad \text{and} \quad 0 \in \partial_x \mathcal{L}(x, y), \tag{2.4} \]

where $\partial_x \mathcal{L}$ denotes the subdifferential of $\mathcal{L}$ with respect to $x$. The second condition reduces to

\[ 0 \in \mathrm{sgn}(x) - A^T y, \]

where the signum function

\[ \mathrm{sgn}(\gamma) \in \begin{cases} \mathrm{sign}(\gamma) & \text{if } \gamma \ne 0, \\ [-1, 1] & \text{otherwise,} \end{cases} \]

is applied to each individual component of $x$. It follows that $x^*$ is a solution of (1.2) if and only if $Ax^* = b$ and there exists an $m$-vector $y$ such that $|a_j^T y| \le 1$ for $j \notin \mathrm{Supp}(x^*)$, and $a_j^T y = \mathrm{sign}(x_j^*)$ for all $j \in \mathrm{Supp}(x^*)$. Fuchs [16] shows that $x^*$ is the unique solution of (1.2) when $[a_j]_{j \in \mathrm{Supp}(x^*)}$ is full rank and, in addition, $|a_j^T y| < 1$ for all $j \notin \mathrm{Supp}(x^*)$. When the columns of $A$ are in general position (i.e., no $k + 1$ columns of $A$ span the same $k - 1$ dimensional hyperplane for $k < n$) we can weaken this condition by noting that for such $A$, the solution of (1.2) is always unique, thus making the existence of a $y$ that satisfies (2.4) for $x_0$ a necessary and sufficient condition for $\ell_1$ to recover $x_0$.
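The dual-certificate conditions above lend themselves to a direct numerical check. The following sketch, with a hypothetical helper name and a hand-picked example, verifies for a given pair $(x, y)$ that $a_j^T y = \mathrm{sign}(x_j)$ on the support and $|a_j^T y| \le 1$ elsewhere. It does not check primal feasibility $Ax = b$ and does not search for $y$; a candidate must be supplied.

```python
import math

def is_l1_dual_certificate(A, x, y, tol=1e-9):
    """Return True if y certifies optimality of x for basis pursuit.

    Checks a_j^T y = sign(x_j) on the support of x and |a_j^T y| <= 1 off it.
    A is a list of rows; x has length n and y has length m.
    """
    m, n = len(A), len(A[0])
    for j in range(n):
        corr = sum(A[i][j] * y[i] for i in range(m))  # a_j^T y
        if abs(x[j]) > tol:
            if abs(corr - math.copysign(1.0, x[j])) > tol:
                return False
        elif abs(corr) > 1.0 + tol:
            return False
    return True

A = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]

# x = (2, 0, 0): y = (1, -0.5) gives A^T y = (1, -0.5, 0.5).
print(is_l1_dual_certificate(A, [2.0, 0.0, 0.0], [1.0, -0.5]))

# x = (1, 1, 0): the support conditions force y = (1, 1), but then a_3^T y = 2.
print(is_l1_dual_certificate(A, [1.0, 1.0, 0.0], [1.0, 1.0]))
```

In the second case the only candidate fails, which agrees with the fact that $(0, 0, 1)$ explains the same measurements with a smaller $\ell_1$-norm.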



3 Recovery using sums-of-row norms 

Our analysis of sparse recovery for the MMV problem of recovering $X_0$ from $B = AX_0$ begins with an extension of Theorem 2.1 to recovery using the convex relaxation

\[ \underset{X}{\text{minimize}} \quad \sum_{j=1}^{n} \|X^{j\rightarrow}\| \quad \text{subject to} \quad AX = B; \tag{3.1} \]

note that the norm within the summation is arbitrary. Define the row support of a matrix as

\[ \mathrm{Supp}_{\mathrm{row}}(X) := \{j \mid \|X^{j\rightarrow}\| \ne 0\}. \]

With these definitions we have the following result. (A related result is given by Stojnic et al. [26].)

Theorem 3.1. Let $A$ be an $m \times n$ matrix, $k$ be a positive integer, $\mathcal{I} \subseteq \{1, \ldots, n\}$ be a fixed index set, and let $\|\cdot\|$ denote any vector norm. Then all $X_0 \in \mathbb{R}^{n \times k}$ with $\mathrm{Supp}_{\mathrm{row}}(X_0) \subseteq \mathcal{I}$ can be uniquely recovered from $B = AX_0$ using (3.1) if and only if for all $Z$ with columns $Z^{\downarrow k} \in \mathrm{Ker}(A) \setminus \{0\}$,

\[ \sum_{j \in \mathcal{I}} \|Z^{j\rightarrow}\| < \sum_{j \notin \mathcal{I}} \|Z^{j\rightarrow}\|. \tag{3.2} \]

Proof. For the "only if" part, suppose that there is a $Z$ with columns $Z^{\downarrow k} \in \mathrm{Ker}(A) \setminus \{0\}$ such that (3.2) does not hold. Now, choose $X^{j\rightarrow} = Z^{j\rightarrow}$ for all $j \in \mathcal{I}$ and with all remaining rows zero. Set $B = AX$. Next, define $V := X - Z$, and note that $AV = AX - AZ = AX = B$. The construction of $V$ implies that $\sum_j \|X^{j\rightarrow}\| \ge \sum_j \|V^{j\rightarrow}\|$, and consequently $X$ cannot be the unique solution of (3.1).

Conversely, let $X$ be an arbitrary matrix with $\mathrm{Supp}_{\mathrm{row}}(X) \subseteq \mathcal{I}$, and let $B = AX$. To show that $X$ is the unique solution of (3.1) it suffices to show that for any $Z$ with columns $Z^{\downarrow k} \in \mathrm{Ker}(A) \setminus \{0\}$,

\[ \sum_{j} \|(X + Z)^{j\rightarrow}\| > \sum_{j} \|X^{j\rightarrow}\|. \]

This is equivalent to

\[ \sum_{j \notin \mathcal{I}} \|Z^{j\rightarrow}\| + \sum_{j \in \mathcal{I}} \|(X + Z)^{j\rightarrow}\| - \sum_{j \in \mathcal{I}} \|X^{j\rightarrow}\| > 0. \]

Applying the reverse triangle inequality, $\|a + b\| - \|b\| \ge -\|a\|$, to the summation over $j \in \mathcal{I}$ and reordering exactly gives condition (3.2). □



In the special case of the sum of $\ell_1$-norms, i.e., $\ell_{1,1}$, summing the norms of the columns is equivalent to summing the norms of the rows. As a result, (3.1) can be written as

\[ \underset{X}{\text{minimize}} \quad \sum_{k=1}^{r} \|X^{\downarrow k}\|_1 \quad \text{subject to} \quad A X^{\downarrow k} = B^{\downarrow k}, \quad k = 1, \ldots, r. \]

Because this objective is separable, the problem can be decoupled and solved as a series of independent basis pursuit problems, giving one $X^{\downarrow k}$ for each column $B^{\downarrow k}$ of $B$. The following result relates recovery using the sum-of-norms formulation (3.1) to $\ell_{1,1}$ recovery.

Theorem 3.2. Let $A$ be an $m \times n$ matrix, $r$ be a positive integer, $\mathcal{I} \subseteq \{1, \ldots, n\}$ be a fixed index set, and $\|\cdot\|$ denote any vector norm. Then uniform recovery of all $X \in \mathbb{R}^{n \times r}$ with $\mathrm{Supp}_{\mathrm{row}}(X) \subseteq \mathcal{I}$ using sums of norms (3.1) implies uniform recovery on $\mathcal{I}$ using $\ell_{1,1}$.

Proof. For uniform recovery on support $\mathcal{I}$ to hold it follows from Theorem 3.1 that for any matrix $Z$ with columns $Z^{\downarrow k} \in \mathrm{Ker}(A) \setminus \{0\}$, property (3.2) holds. In particular it holds for $Z$ with $Z^{\downarrow k} = z$ for all $k$, with $z \in \mathrm{Ker}(A) \setminus \{0\}$. Note that for these matrices there exists a norm-dependent constant $\gamma$ such that

\[ |z_j| = \gamma \|Z^{j\rightarrow}\|. \]

Since the choice of $z$ was arbitrary, it follows from (3.2) that the NS-condition (2.3) for independent recovery of vectors $B^{\downarrow k}$ using $\ell_1$ in Theorem 2.1 is satisfied. Moreover, because $\ell_{1,1}$ is equivalent to independent recovery, we also have uniform recovery on $\mathcal{I}$ using $\ell_{1,1}$. □
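The key step of this proof, that a matrix with identical columns has row norms proportional to the entries of the repeated column, can be checked numerically. The sketch below uses hypothetical helper names and a small kernel vector; for the $\ell_q$ norms the constant works out to $\gamma = 1 / \|(1, \ldots, 1)\|_q$, which is an observation added here rather than a statement from the text.

```python
import math

def vector_norm(v, q):
    """l_q norm of a list of numbers; q may be math.inf."""
    if q == math.inf:
        return max(abs(a) for a in v)
    return sum(abs(a) ** q for a in v) ** (1.0 / q)

def gamma_for(r, q):
    """Constant with |z_j| = gamma * ||Z^{j->}||_q when all r columns of Z equal z."""
    return 1.0 / vector_norm([1.0] * r, q)

def repeat_columns(z, r):
    """Build the n x r matrix (as a list of rows) whose columns all equal z."""
    return [[zj] * r for zj in z]

z = [1.0, 1.0, -1.0]
r = 4
Z = repeat_columns(z, r)

for q in (1, 2, math.inf):
    gamma = gamma_for(r, q)
    # Each row norm, scaled by gamma, reproduces |z_j|.
    print(q, gamma, [gamma * vector_norm(row, q) for row in Z])
```

Because the same factor $\gamma$ multiplies both sides of (3.2), the inequality for such a $Z$ is the inequality (2.3) for $z$.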



An implication of Theorem 3.2 is that the use of restricted isometry conditions (or any technique, for that matter) to analyze uniform recovery conditions for the sum-of-norms approach necessarily leads to results that are no stronger than uniform $\ell_1$ recovery. (Recall that the $\ell_{1,1}$ and $\ell_1$ norms are equivalent.)



3.1 Recovery using $\ell_{1,2}$

In this section we take a closer look at the $\ell_{1,2}$ problem

\[ \underset{X}{\text{minimize}} \quad \|X\|_{1,2} \quad \text{subject to} \quad AX = B, \tag{3.3} \]

which is a special case of the sum-of-norms problem. Although Theorem 3.2 establishes that uniform recovery via $\ell_{1,2}$ is no better than uniform recovery via $\ell_{1,1}$, there are many situations in which it recovers signals that $\ell_{1,1}$ cannot. Indeed, it is evident from Figure 1 that the probability of recovering individual signals with random signs and support is much higher for $\ell_{1,2}$. The reason for the degrading performance of $\ell_{1,1}$ with increasing $k$ is explained in Section 4.

In this section we construct examples for which $\ell_{1,2}$ works and $\ell_{1,1}$ fails, and vice versa. This helps uncover some of the structure of $\ell_{1,2}$, but at the same time implies that certain techniques used to study $\ell_1$ can no longer be used directly. Because the examples are based on extensions of the results from Section 2.3, we first develop equivalent conditions here.



Figure 1: Recovery rates for fixed, randomly drawn $20 \times 60$ matrices $A$, averaged over 1,000 trials at each row-sparsity level $s$. The nonzero entries in the $60 \times r$ matrix $X_0$ are sampled i.i.d. from the normal distribution. The solid and dashed lines represent $\ell_{1,2}$ and $\ell_{1,1}$ recovery, respectively.



3.1.1 Sufficient conditions for recovery via $\ell_{1,2}$

The optimality conditions of the $\ell_{1,2}$ problem (3.3) play a vital role in deriving a set of sufficient conditions for joint-sparse recovery. In this section we derive the dual of (3.3) and the corresponding necessary and sufficient optimality conditions. These allow us to derive sufficient conditions for recovery via $\ell_{1,2}$.

The Lagrangian for (3.3) is defined as

\[ \mathcal{L}(X, Y) = \|X\|_{1,2} - \langle Y, AX - B \rangle, \tag{3.4} \]

where $\langle V, W \rangle := \mathrm{trace}(V^T W)$ is an inner-product defined over real matrices. The dual is then given by maximizing

\[ \inf_X \mathcal{L}(X, Y) = \inf_X \{\|X\|_{1,2} - \langle Y, AX - B \rangle\} = \langle B, Y \rangle - \sup_X \{\langle A^T Y, X \rangle - \|X\|_{1,2}\} \tag{3.5} \]

over $Y$. (Because the primal problem has only linear constraints, there necessarily exists a dual solution $Y^*$ that maximizes this expression [25, Theorem 28.2].) To simplify the supremum term, we note that for any convex, positively homogeneous function $f$ defined over an inner-product space,

\[ \sup_v \{\langle w, v \rangle - f(v)\} = \begin{cases} 0 & \text{if } w \in \partial f(0), \\ \infty & \text{otherwise.} \end{cases} \]

To derive these conditions, note that positive homogeneity of $f$ implies that $f(0) = 0$, and thus $w \in \partial f(0)$ implies that $\langle w, v \rangle \le f(v)$ for all $v$. Hence, the supremum is achieved with $v = 0$. If on the other hand $w \notin \partial f(0)$, then there exists some $v$ such that $\langle w, v \rangle > f(v)$, and by the positive homogeneity of $f$, $\langle w, \alpha v \rangle - f(\alpha v) \to \infty$ as $\alpha \to \infty$. Applying this expression for the supremum to (3.5), we arrive at the necessary condition

\[ A^T Y \in \partial \|0\|_{1,2}, \tag{3.6} \]

which is required for dual feasibility.

We now derive an expression for the subdifferential $\partial \|X\|_{1,2}$. For rows $j$ where $\|X^{j\rightarrow}\|_2 > 0$, the gradient is given by $\nabla \|X^{j\rightarrow}\|_2 = X^{j\rightarrow} / \|X^{j\rightarrow}\|_2$. For the remaining rows, the gradient is not defined, but $\partial \|X^{j\rightarrow}\|_2$ coincides with the unit $\ell_2$-norm ball $B_{\ell_2}^r := \{v \in \mathbb{R}^r \mid \|v\|_2 \le 1\}$. Thus, for each $j = 1, \ldots, n$,

\[ \partial_{X^{j\rightarrow}} \|X\|_{1,2} \in \begin{cases} X^{j\rightarrow} / \|X^{j\rightarrow}\|_2 & \text{if } \|X^{j\rightarrow}\|_2 > 0, \\ B_{\ell_2}^r & \text{otherwise.} \end{cases} \tag{3.7} \]

Combining this expression with (3.6), we arrive at the dual of (3.3):

\[ \underset{Y}{\text{maximize}} \quad \mathrm{trace}(B^T Y) \quad \text{subject to} \quad \|A^T Y\|_{\infty,2} \le 1. \tag{3.8} \]

The following conditions are therefore necessary and sufficient for a primal-dual pair $(X^*, Y^*)$ to be optimal for (3.3) and its dual (3.8):

\[ AX^* = B \quad \text{(primal feasibility);} \tag{3.9a} \]
\[ \|A^T Y^*\|_{\infty,2} \le 1 \quad \text{(dual feasibility);} \tag{3.9b} \]
\[ \|X^*\|_{1,2} = \mathrm{trace}(B^T Y^*) \quad \text{(zero duality gap).} \tag{3.9c} \]

The existence of a matrix $Y^*$ that satisfies (3.9) provides a certificate that the feasible matrix $X^*$ is an optimal solution of (3.3). However, it does not guarantee that $X^*$ is also the unique solution. The following theorem gives sufficient conditions, similar to those in Section 2.3, that also guarantee uniqueness of the solution.

Theorem 3.3. Let $A$ be an $m \times n$ matrix, and $B$ be an $m \times r$ matrix. Then a set of sufficient conditions for $X$ to be the unique minimizer of (3.3) with Lagrange multiplier $Y \in \mathbb{R}^{m \times r}$ and row support $\mathcal{I} = \mathrm{Supp}_{\mathrm{row}}(X)$, is that

\[ AX = B, \tag{3.10a} \]
\[ (A^T Y)^{j\rightarrow} = X^{j\rightarrow} / \|X^{j\rightarrow}\|_2, \quad j \in \mathcal{I}, \tag{3.10b} \]
\[ \|(A^T Y)^{j\rightarrow}\|_2 < 1, \quad j \notin \mathcal{I}, \tag{3.10c} \]
\[ \mathrm{rank}(A_{\mathcal{I}}) = |\mathcal{I}|. \tag{3.10d} \]

Proof. The first three conditions clearly imply that $(X, Y)$ are primal and dual feasible, and thus satisfy (3.9a) and (3.9b). Conditions (3.10b) and (3.10c) together imply that

\[ \mathrm{trace}(B^T Y) = \sum_{j=1}^{n} [(A^T Y)^{j\rightarrow}]^T X^{j\rightarrow} = \sum_{j=1}^{n} \|(A^T Y)^{j\rightarrow}\|_2 \cdot \|X^{j\rightarrow}\|_2 = \|X\|_{1,2}. \]

The first and last identities above follow directly from the definitions of the matrix trace and of the norm $\|\cdot\|_{1,2}$, respectively; the middle equality follows from the standard Cauchy inequality. Thus, the zero-gap requirement (3.9c) is satisfied. The conditions (3.10a)-(3.10c) are therefore sufficient for $(X, Y)$ to be an optimal primal-dual solution of (3.3). Because $Y$ determines the support and is a Lagrange multiplier for every solution $X$, this support must be unique. It then follows from condition (3.10d) that $X$ must be unique. □
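The row-wise conditions (3.10b) and (3.10c), together with the zero duality gap, can be verified numerically for a candidate pair. The sketch below uses hypothetical helper names and a hand-built $2 \times 3$ example with $r = 2$; it assumes $B = AX$ is formed exactly and leaves the rank condition (3.10d) unchecked.

```python
import math

def check_l12_certificate(A, X, Y, tol=1e-9):
    """Check (3.10b) on nonzero rows of X and (3.10c) on zero rows, for G = A^T Y."""
    m, n, r = len(A), len(A[0]), len(Y[0])
    for j in range(n):
        g = [sum(A[i][j] * Y[i][k] for i in range(m)) for k in range(r)]  # row j of A^T Y
        row_norm = math.sqrt(sum(v * v for v in X[j]))
        if row_norm > tol:
            # On the row support: row of A^T Y equals the normalized row of X.
            if any(abs(g[k] - X[j][k] / row_norm) > tol for k in range(r)):
                return False
        elif math.sqrt(sum(v * v for v in g)) >= 1.0 - tol:
            # Off the row support: strict inequality is required.
            return False
    return True

def trace_inner(B, Y):
    """Inner product <B, Y> = trace(B^T Y) for matrices stored as lists of rows."""
    return sum(b * y for brow, yrow in zip(B, Y) for b, y in zip(brow, yrow))

A = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
X = [[3.0, 4.0], [0.0, 0.0], [0.0, 0.0]]   # one nonzero row with l_2 norm 5
B = [[3.0, 4.0], [0.0, 0.0]]               # B = A X
Y = [[0.6, 0.8], [-0.3, -0.4]]             # candidate Lagrange multiplier

print(check_l12_certificate(A, X, Y))
print(trace_inner(B, Y))  # should match ||X||_{1,2} = 5 up to rounding
```

Replacing the second row of `Y` by `[0.6, 0.8]` makes the second row of $A^T Y$ have unit norm, so the strict inequality (3.10c) fails and the check returns `False`.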



3.2 Counter examples 

Using the sufficient and necessary conditions developed in the previous section we now construct examples of problems for which $\ell_{1,2}$ succeeds while $\ell_{1,1}$ fails, and vice versa. Because of its simplicity, we begin with the latter.

Recovery using $\ell_{1,1}$ where $\ell_{1,2}$ fails. Let $A$ be an $m \times n$ matrix with $m < n$ and unit-norm columns that are not scalar multiples of each other. Take any vector $x \in \mathbb{R}^n$ with at least $m + 1$ nonzero entries. Then $X_0 = \mathrm{diag}(x)$, possibly with all identically zero columns removed, can be recovered from $B = AX_0$ using $\ell_{1,1}$, but not with $\ell_{1,2}$. To see why, note that each column in $X_0$ has only a single nonzero entry, and that, under the assumptions on $A$, each one-sparse vector can be recovered individually using $\ell_1$ (the points $\pm A^{\downarrow j} \in \mathbb{R}^m$ are all 0-faces of $\mathcal{P}$) and therefore that $X_0$ can be recovered using $\ell_{1,1}$.

On the other hand, for recovery using $\ell_{1,2}$ there would need to exist a matrix $Y$ satisfying the first condition of (3.9) for all $j \in \mathcal{I} = \{1, \ldots, n\}$. For this given $X_0$ this reduces to $A^T Y = M$, where $M$ is the identity matrix, with the same columns removed as $X_0$. But this equality is impossible to satisfy because $\mathrm{rank}(A) \le m < m + 1 \le \mathrm{rank}(M)$. Thus, $X_0$ cannot be the solution of the $\ell_{1,2}$ problem (3.3).



Recovery using $\ell_{1,2}$ where $\ell_{1,1}$ fails. For the construction of a problem where $\ell_{1,2}$ succeeds and $\ell_{1,1}$ fails, we consider two vectors, $f$ and $s$, with the same support $\mathcal{I}$, in such a way that individual $\ell_1$ recovery fails for $f$, while it succeeds for $s$. In addition we assume that there exists a vector $y$ that satisfies

\[ y^T A^{\downarrow j} = \mathrm{sign}(s_j) \ \text{for all } j \in \mathcal{I}, \quad \text{and} \quad |y^T A^{\downarrow j}| < 1 \ \text{for all } j \notin \mathcal{I}; \]

i.e., $y$ satisfies conditions (3.10b) and (3.10c). Using the vectors $f$ and $s$, we construct the 2-column matrix $X_0 = [(1 - \gamma)s, \gamma f]$, and claim that for sufficiently small $\gamma > 0$, this gives the desired reconstruction problem. Clearly, for any $\gamma \ne 0$, $\ell_{1,1}$ recovery fails because the second column can never be recovered, and we only need to show that $\ell_{1,2}$ does succeed.

For $\gamma = 0$, the matrix $Y = [y, 0]$ satisfies conditions (3.10b) and (3.10c) and, assuming (3.10d) is also satisfied, $X_0$ is the unique solution of $\ell_{1,2}$ with $B = AX_0$. For sufficiently small $\gamma > 0$, the conditions that $Y$ needs to satisfy change slightly due to the division by $\|X_0^{j\rightarrow}\|_2$ for those rows in $\mathrm{Supp}_{\mathrm{row}}(X)$. By adding corrections to the columns of $Y$ those new conditions can be satisfied. In particular, these corrections can be done by adding weighted combinations of the columns in $\bar{Y}$, which is constructed in such a way that it satisfies $A_{\mathcal{I}}^T \bar{Y} = I$, and minimizes $\|A_{\mathcal{I}^c}^T \bar{Y}\|_{\infty,\infty}$ on the complement $\mathcal{I}^c$ of $\mathcal{I}$.

Note that the above argument can also be used to show that $\ell_{1,2}$ fails for $\gamma$ sufficiently close to one. Because the support and signs of $X$ remain the same for all $0 < \gamma < 1$, we can conclude the following:

Corollary 3.4. Recovery using $\ell_{1,2}$ is generally not only characterized by the row-support and the sign pattern of the nonzero entries in $X_0$, but also by the magnitude of the nonzero entries.

A consequence of this conclusion is that the notion of faces used in the geometrical interpretation of $\ell_1$ is not applicable to the $\ell_{1,2}$ problem.

3.3 Experiments

To get an idea of just how much more $\ell_{1,2}$ can recover in the above case where $\ell_{1,1}$ fails, we generated a $20 \times 60$ matrix $A$ with entries i.i.d. normally distributed, and determined a set of vectors $s_i$ and $f_i$ with identical support for which $\ell_1$ recovery succeeds and fails, respectively. Using triples of vectors $s_i$ and $f_i$ we constructed row-sparse matrices such as $X_0 = [s_1, f_1, f_2]$ or $X_0 = [s_1, s_2, f_2]$, and attempted to recover from $B = AX_0 W$, where $W = \mathrm{diag}(w_1, w_2, w_3)$ is a diagonal weighting matrix with nonnegative entries and unit trace, by solving (3.3). For problems of this size, interior-point methods are very efficient and we use SDPT3 [30] through the CVX interface [17, 18]. We consider $X_0$ to be recovered when the maximum absolute difference between $X_0$ and the $\ell_{1,2}$ solution $X^*$ is less than 10~^. The results of the experiment are shown in Figure 2. In addition to the expected regions of recovery around individual columns $s_i$ and failure around $f_i$, we see that certain combinations of vectors $s_i$ still fail, while other combinations of vectors $f_i$ may be recoverable. By contrast, when using $\ell_{1,1}$ to solve the problem, any combination of $s_i$ vectors can be recovered while no combination including an $f_i$ can be recovered.
Figure 2: Generation of problems where $\ell_{1,2}$ succeeds, while $\ell_{1,1}$ fails. For a $20 \times 60$ matrix $A$ and fixed support of size $|\mathcal{I}| = 5, 7, 10$, we create vectors $f_i$ that cannot be recovered using $\ell_1$, and vectors $s_i$ that can be recovered. Each triangle represents an $X_0$ constructed from the vectors denoted in the corners. The location in the triangle determines the weight on each vector, ranging from zero to one, and summing up to one. The dark areas indicate the weights for which $\ell_{1,2}$ successfully recovered $X_0$.



4 Boosted $\ell_1$

As described in Section 3, recovery using $\ell_{1,1}$ is equivalent to individual $\ell_1$ recovery of each column $x_k := X_0^{\downarrow k}$ based on $b_k := B^{\downarrow k}$, for $k = 1, \ldots, r$:

\[ \underset{x}{\text{minimize}} \quad \|x\|_1 \quad \text{subject to} \quad Ax = b_k. \tag{4.1} \]

Assuming that the signs of nonzero entries in the support of each $x_k$ are drawn i.i.d. from $\{1, -1\}$, we can express the probability of recovering a matrix $X_0$ with row support $\mathcal{I}$ using $\ell_{1,1}$ in terms of the probability of recovering vectors on that support using $\ell_1$. To see how, note that $\ell_{1,1}$ recovers the original $X_0$ if and only if each individual problem in (4.1) successfully recovers each $x_k$. For the above class of matrices $X_0$ this therefore gives a recovery rate of

\[ [P_{\ell_1}(A, \mathcal{I})]^r. \]

Using $\ell_{1,1}$ to recover $X_0$ is clearly not a good idea. Note also that uniform recovery of $X_0$ on a support $\mathcal{I}$ remains unchanged, regardless of the number of observations, $r$, that are given. As a consequence of Theorem 3.2 this also means that the uniform-recovery properties for any sum-of-norms approach cannot increase with $r$. This clearly defeats the purpose of gathering multiple observations.

In many instances where $\ell_{1,1}$ fails, it may still recover a subset of columns $x_k$ from the corresponding observations $b_k$. It seems wasteful to discard this information because if we could recognize a single correctly recovered $x_k$, we would immediately know the row support $\mathcal{I} = \mathrm{Supp}_{\mathrm{row}}(X_0) = \mathrm{Supp}(x_k)$ of $X_0$. Given the correct support we can recover the nonzero part $\bar{X}$ of $X_0$ by solving

\[ \underset{\bar{X}}{\text{minimize}} \quad \|A_{\mathcal{I}} \bar{X} - B\|_F. \tag{4.2} \]

In practice we obviously do not know the correct support, but when a given solution $x_k^*$ of (4.1) is sufficiently sparse, we can try to solve (4.2) for that support and verify if the residual at the solution is zero. If so, we construct the final $X^*$ using the nonzero part and declare success.

Figure 3: The boosted $\ell_1$ algorithm.

Figure 4: Theoretical (dashed) and experimental (solid) performance of boosted $\ell_1$ for three problem instances with different row support $s$.


Otherwise we simply increment $k$ and repeat this process until a solution is found or there are no more observations, in which case recovery is unsuccessful. We refer to this algorithm, which is reminiscent of the ReMBo approach [23], as boosted $\ell_1$; its sole aim is to provide a bridge to the analysis of ReMBo. The complete boosted $\ell_1$ algorithm is outlined in Figure 3.

The recovery properties of the boosted $\ell_1$ approach are opposite from those of $\ell_{1,1}$: it fails only if all individual columns fail to be recovered using $\ell_1$. Hence, given an unknown $n \times r$ matrix $X$ supported on $\mathcal{I}$ with its sign pattern uniformly random, the boosted $\ell_1$ algorithm gives an expected recovery rate of

\[ P_{\ell_1^B}(A, \mathcal{I}, r) = 1 - [1 - P_{\ell_1}(A, \mathcal{I})]^r. \tag{4.3} \]
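To make the contrast between the two approaches concrete, the following sketch evaluates both rates as functions of the per-column recovery probability $p$ and the number of observations $r$. The boosted rate is (4.3); the $\ell_{1,1}$ rate $p^r$ is the one implied by requiring every column to be recovered under the same i.i.d. sign assumption. The function names and the value $p = 0.5$ are illustrative choices, not taken from the experiments.

```python
def l11_rate(p, r):
    """Probability that all r independent columns are recovered by l_1."""
    return p ** r

def boosted_l1_rate(p, r):
    """Probability that at least one of r independent columns is recovered, as in (4.3)."""
    return 1.0 - (1.0 - p) ** r

p = 0.5  # hypothetical per-column l_1 recovery probability on a fixed support
for r in (1, 2, 3, 5):
    print(r, l11_rate(p, r), boosted_l1_rate(p, r))
```

With $r = 1$ the two rates coincide; as $r$ grows the $\ell_{1,1}$ rate decays toward zero while the boosted rate approaches one, provided $0 < p < 1$.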

To experimentally verify this recovery rate, we generated a $20 \times 80$ matrix $A$ with entries independently sampled from the normal distribution and fixed a randomly chosen support set for three levels of sparsity, $s = 8, 9, 10$. On each of these three supports we generated vectors with all possible sign patterns and solved (1.2) to see if they could be recovered or not (see Section 3.3). This gives exactly the face counts required to compute the $\ell_1$ recovery probability in (2.2), and the expected boosted $\ell_1$ recovery rate in (4.3).

For the empirical success rate we take the average over 1,000 trials with random coefficient matrices $X$ supported on $\mathcal{I}$, and its nonzero entries independently drawn from the normal distribution. To reduce the computational time we avoid solving $\ell_1$ and instead compare the sign pattern of the current solution $x_k$ against the information computed to determine the face counts (both $A$ and $\mathcal{I}$ remain fixed). The theoretical and empirical recovery rates using boosted $\ell_1$ are plotted in Figure 4.

5 Recovery using ReMBo 

The boosted $\ell_1$ approach can be seen as a special case of the ReMBo [23] algorithm. ReMBo proceeds by taking a random vector $w \in \mathbb{R}^r$ and combining the individual observations in $B$ into a single weighted observation $b := Bw$. It then solves a single measurement vector problem $Ax = b$ for this $b$ (we shall use $\ell_1$ throughout) and checks if the computed solution $x^*$ is sufficiently sparse. If not, the above steps are repeated with a different weight vector $w$; the algorithm stops when a maximum number of trials is reached. If the support $\mathcal{I}$ of $x^*$ is small, we form $A_{\mathcal{I}} = [A^{\downarrow j}]_{j \in \mathcal{I}}$, and check if (4.2) has a solution $\bar{X}$ with zero residual. If this is the case we have the nonzero rows of the solution $X^*$ in $\bar{X}$ and are done. Otherwise, we simply proceed with the next $w$. The ReMBo algorithm reduces to boosted $\ell_1$ by limiting the number of iterations to $r$ and choosing $w = e_i$ in the $i$th iteration. We summarize the ReMBo-$\ell_1$ algorithm in Figure 5.

    given A, B. Set Iteration ← 0
    while Iteration < MaxIteration do
        w ← Random(r, 1)
        solve (1.2) with b = Bw to get x
        I ← Supp(x)
        if |I| < m/2 then
            solve (4.2) to get X̄
            if A_I X̄ = B then
                X* = 0
                (X*)^{j→} ← X̄^{j→} for j ∈ I
                return solution X*
        Iteration ← Iteration + 1
    return failure

Figure 5: The ReMBo-$\ell_1$ algorithm.

Figure 6: Theoretical performance model for ReMBo on three problem instances with different sparsity levels $s$.

The formulation given in [23] requires a user-defined threshold on the cardinality of the support $\mathcal{I}$ instead of the fixed threshold $m/2$. Ideally this threshold should be half of the spark [12] of $A$, where

\[ \mathrm{Spark}(A) := \min_{z \in \mathrm{Ker}(A) \setminus \{0\}} \|z\|_0, \]

which is the number of nonzeros of the sparsest vector in the kernel of $A$; any vector $x_0$ with fewer than $\mathrm{Spark}(A)/2$ nonzeros is the unique sparsest solution of $Ax = Ax_0 = b$ [12]. Unfortunately, the spark is prohibitively expensive to compute, but under the assumption that $A$ is in general position, $\mathrm{Spark}(A) = m + 1$. Note that choosing a higher value can help to recover signals with row sparsity exceeding $m/2$. However, in this case it can no longer be guaranteed to be the sparsest solution.
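For very small matrices the spark can be found by exhaustive search, which also shows why it is expensive in general: the search visits subsets of columns of increasing size. The sketch below is a brute-force illustration with hypothetical helper names (`rank`, `spark`); it uses exact rational arithmetic from the standard library and is not meant for matrices of practical size.

```python
from fractions import Fraction
from itertools import combinations

def rank(columns):
    """Rank of the matrix whose columns are given, via exact Gaussian elimination."""
    rows = [[Fraction(col[i]) for col in columns] for i in range(len(columns[0]))]
    rk = 0
    for c in range(len(columns)):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][c] != 0:
                factor = rows[i][c] / rows[rk][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
        if rk == len(rows):
            break
    return rk

def spark(A):
    """Smallest number of linearly dependent columns of A (None if there are none)."""
    m, n = len(A), len(A[0])
    columns = [[A[i][j] for i in range(m)] for j in range(n)]
    for size in range(1, n + 1):
        for subset in combinations(columns, size):
            if rank(list(subset)) < size:
                return size
    return None  # all columns are independent, so the kernel is trivial

print(spark([[1, 0, 1], [0, 1, 1]]))  # columns in general position: m + 1
print(spark([[1, 2, 0], [0, 0, 1]]))  # two parallel columns
```

The first matrix has $m = 2$ and every pair of columns is independent, so its spark is $m + 1$; the second has two parallel columns, which lowers the spark to two.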

To derive the performance analysis of ReMBo, we fix a support $\mathcal{I}$ of cardinality $s$, and consider only signals with nonzero entries on this support. Each time we multiply $B$ by a weight vector $w$, we in fact create a new problem with an $s$-sparse solution $x_0 = X_0 w$ corresponding with a right-hand side $b = Bw = AX_0 w = Ax_0$. As reflected in (2.2), recovery of $x_0$ using $\ell_1$ depends only on its support and sign pattern. Clearly, the more sign patterns in $x_0$ that we can generate, the higher the probability of recovery. Moreover, due to the elimination of previously tried sign patterns, the probability of recovery goes up with each new sign pattern (excluding negation of previous sign patterns). The maximum number of sign patterns we can check with boosted $\ell_1$ is the number of observations $r$. The question thus becomes, how many different sign patterns can we generate by taking linear combinations of the columns in $X_0$? (We disregard the situation where elimination occurs and $|\mathrm{Supp}(X_0 w)| < s$.) Equivalently, we can ask how many orthants in $\mathbb{R}^s$ (each one corresponding to a different sign pattern) can be properly intersected by the hyperplane given by the range of the $s \times r$ matrix $\bar{X}$ consisting of the nonzero rows of $X_0$ (by proper we mean intersection of the interior). In Section 5.1 we derive an exact expression for the maximum number of proper orthant intersections in $\mathbb{R}^n$ by a hyperplane generated by $d$ vectors, denoted by $C(n, d)$.

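Based on the above reasoning, a good model for the recovery rate of $n \times r$ matrices $X_0$ with
$\mathrm{Supp}_{\mathrm{row}}(X_0) = \mathcal{I}$ and $|\mathcal{I}| < m/2$ using ReMBo is given by

$$P_R(A, \mathcal{I}, r) = 1 - \prod_{i=1}^{C(|\mathcal{I}|,r)/2} \left[ 1 - \frac{|\mathcal{F}_{\mathcal{I}}(AC)|}{|\mathcal{F}_{\mathcal{I}}(C)| - 2(i-1)} \right]. \tag{5.1}$$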

The term within brackets denotes the probability of failure and the fraction represents the success
rate, which is given by the ratio of the number of faces $|\mathcal{F}_{\mathcal{I}}(AC)|$ that survived the mapping to the
total number of faces to consider. The total number reduces by two at each trial because we can
exclude the face $f$ we just tried, as well as $-f$. The factor of two in $C(|\mathcal{I}|,r)/2$ is also due to this
symmetry.¹
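
The product in (5.1) is easy to evaluate numerically. The following Python sketch does so under our own naming (`rembo_model`, `faces_total`, `faces_surviving`, and `patterns` are illustrative names, and the face counts in the example are hypothetical rather than taken from the experiments). It also returns 1 as soon as only surviving faces remain to be tried, and 0 if no face survives.

```python
def rembo_model(faces_total, faces_surviving, patterns):
    """Evaluate model (5.1).

    faces_total     -- number of faces of C on the support, |F_I(C)|
    faces_surviving -- number of those faces that survive the mapping, |F_I(AC)|
    patterns        -- number of unique sign-pattern pairs tried
    """
    if faces_surviving <= 0:
        return 0.0  # no recoverable sign pattern exists on this support
    p_fail = 1.0
    for i in range(1, patterns + 1):
        # Each failed trial rules out the face tried and its negation.
        remaining = faces_total - 2 * (i - 1)
        if remaining <= faces_surviving:
            return 1.0  # only surviving faces are left, so success is certain
        p_fail *= 1.0 - faces_surviving / remaining
    return 1.0 - p_fail


# Hypothetical counts for a support of size 10: 2**10 faces, 600 of which survive.
for t in (1, 4, 16):
    print(t, rembo_model(2 ** 10, 600, t))
```

With 8 faces of which 4 survive, for example, the model gives 1/2 after one pattern, 5/6 after two, and certainty after three, in line with the guarantee once more than half of the lost faces have been tried.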

This model would be a bound for the average performance of ReMBo if the generated sign patterns
were sampled at random from the space of all sign patterns on the given support.
However, because they are generated from the orthant intersections with a hyperplane, the actual
set of patterns is highly structured. Indeed, it is possible to imagine a situation where the $(s-1)$-faces in
$C$ that perish in the mapping to $AC$ have sign patterns that are all contained in the set generated
by a single hyperplane. Any other set of sign patterns would then necessarily include some faces
that survive the mapping, and by trying all patterns in that set we would recover $X_0$. In this case,
the average recovery over all $X_0$ on that support could be much higher than that given by (5.1).
We do not yet fully understand how the surviving faces of $C$ are distributed. Due to the simplicial
structure of the facets of $C$, we can expect the faces that perish to be partially clustered (if a
$(d-2)$-face perishes, then so will the two $(d-1)$-faces whose intersection gives this face), and
partially unclustered (the faces that perish while all their sub-faces survive). Note that, regardless
of these patterns, recovery is guaranteed in the limit whenever the number of unique sign patterns
tried exceeds half the number of faces lost, $(|\mathcal{F}_{\mathcal{I}}(C)| - |\mathcal{F}_{\mathcal{I}}(AC)|)/2$.

Figure 6 illustrates the theoretical performance model based on $C(n,d)$, for which we derive the
exact expression in Section 5.1. In Section 5.2 we discuss practical limitations, and in Section 5.3 we
empirically look at how the number of sign patterns generated grows with the number of normally
distributed vectors $w$, and how this affects the recovery rates. To allow comparison between ReMBo
and boosted $\ell_1$, we used the same matrix $A$ and supports $\mathcal{I}_s$ used to generate Figure 4.

5.1 Maximum number of orthant intersections with subspace 

Theorem 5.1. Let $C(n,d)$ denote the maximum attainable number of orthant interiors intersected
by a hyperplane in $\mathbb{R}^n$ generated by $d$ vectors. Then $C(n,1) = 2$ and $C(n,d) = 2^n$ for $d \ge n$. In
general, $C(n,d)$ is given by

$$C(n,d) = C(n-1,d-1) + C(n-1,d) = 2 \sum_{i=0}^{d-1} \binom{n-1}{i}. \tag{5.2}$$

Proof. The number of intersected orthants is exactly equal to the number of proper sign patterns
(excluding zero values) that can be generated by linear combinations of those $d$ vectors. When
$d = 1$, there can only be two such sign patterns, corresponding to positive and negative multiples
of that vector, thus giving $C(n,1) = 2$. Whenever $d \ge n$, we can choose a basis for $\mathbb{R}^n$ and add
additional vectors as needed, and we can reach all points, and therefore all $2^n = C(n,d)$ sign
patterns.



For the general case (5.2), let $v_1, \ldots, v_d$ be vectors in $\mathbb{R}^n$ such that the affine hull with the origin,
$S = \mathrm{aff}\{0, v_1, \ldots, v_d\}$, gives a hyperplane in $\mathbb{R}^n$ that properly intersects the maximum number of
orthants, $C(n,d)$. Without loss of generality assume that vectors $v_i$, $i = 1, \ldots, d-1$, all have their
$n$th component equal to zero. Now, let $T = \mathrm{aff}\{0, v_1, \ldots, v_{d-1}\} \subseteq \mathbb{R}^{n-1}$ be the intersection of $S$
with the $(n-1)$-dimensional subspace of all points $\mathcal{X} = \{x \in \mathbb{R}^n \mid x_n = 0\}$, and let $C_T$ denote
the number of $(n-1)$-orthants intersected by $T$. Note that $T$ itself, as embedded in $\mathbb{R}^n$, does not
properly intersect any orthant. However, by adding or subtracting an arbitrarily small amount
of $v_d$, we intersect $2C_T$ orthants; taking $v_d$ to be the $n$th column of the identity matrix would
suffice for that matter. Any other orthants that are added have either $x_n > 0$ or $x_n < 0$, and their
number does not depend on the magnitude of the $n$th entry of $v_d$, provided it remains nonzero.
Because only the first $n-1$ entries of $v_d$ determine the maximum number of additional orthants,
the problem reduces to $\mathbb{R}^{n-1}$. In fact, we ask how many new orthants can be added to $C_T$ by taking
the affine hull of $T$ with $v$, the orthogonal projection of $v_d$ onto $\mathcal{X}$. Since the maximum number of
orthants for this $d$-dimensional subspace in $\mathbb{R}^{n-1}$ is given by $C(n-1,d)$, this number is clearly
bounded by $C(n-1,d) - C_T$. Adding this to $2C_T$, we have

$$\begin{aligned}
C(n,d) &\le 2C_T + [C(n-1,d) - C_T] = C_T + C(n-1,d) \\
&\le C(n-1,d-1) + C(n-1,d) \\
&\le 2 \sum_{i=0}^{d-1} \binom{n-1}{i}.
\end{aligned} \tag{5.3}$$

¹ Henceforth we use the convention that the uniqueness of a sign pattern is invariant under negation.

The final expression follows by expanding the recurrence relations, which generates (a part of)
Pascal's triangle, and combining this with $C(1,j) = 2$ for $j \ge 1$. In the above, whenever there
are free orthants in $\mathbb{R}^{n-1}$, that is, when $d < n$, we can always choose the corresponding part of $v_d$
in that orthant. As a consequence, no hyperplane supported by a set of vectors can
intersect the maximum number of orthants when the range of those vectors includes some $e_i$.

We now show that this expression holds with equality. Let $U$ denote an $(n-d)$-hyperplane in
$\mathbb{R}^n$ that intersects the maximum $C(n,n-d)$ orthants. We now claim that in the interior of each
orthant not intersected by $U$ there exists a vector that is orthogonal to $U$. If this were not the
case then $U$ must be aligned with some $e_i$ and can therefore not be optimal. The span of these
orthogonal vectors generates a $d$-hyperplane $V$ that intersects $C_V = 2^n - C(n,n-d)$ orthants, and
it follows that



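$$\begin{aligned}
C(n,d) \ge C_V &= 2^n - C(n,n-d) \\
&\ge 2^n - 2 \sum_{i=0}^{n-d-1} \binom{n-1}{i}
 = 2 \sum_{i=0}^{n-1} \binom{n-1}{i} - 2 \sum_{i=0}^{n-d-1} \binom{n-1}{i} \\
&= 2 \sum_{i=n-d}^{n-1} \binom{n-1}{i} = 2 \sum_{i=0}^{d-1} \binom{n-1}{i} \ge C(n,d),
\end{aligned}$$

where the last inequality follows from (5.3). Consequently, all inequalities hold with equality. □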

Corollary 5.2. Given $d < n$, then $C(n,d) = 2^n - C(n,n-d)$, and $C(2d,d) = 2^{2d-1}$.

Corollary 5.3. A hyperplane $\mathcal{H}$ in $\mathbb{R}^n$, defined as the range of $V = [v_1, v_2, \ldots, v_d]$, intersects
the maximum number of orthants $C(n,d)$ whenever $\mathrm{rank}(V) = n$, or when $e_i \notin \mathrm{range}(V)$ for
$i = 1, \ldots, n$.
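
As a quick numerical check of Theorem 5.1 and Corollary 5.2, the sketch below evaluates the closed form in (5.2) and compares it with the recurrence. The function names `orthant_count` and `orthant_count_rec` are ours, chosen for illustration.

```python
from functools import lru_cache
from math import comb


def orthant_count(n, d):
    """Closed form (5.2): C(n, d) = 2 * sum_{i=0}^{d-1} binom(n-1, i)."""
    return 2 * sum(comb(n - 1, i) for i in range(d))


@lru_cache(maxsize=None)
def orthant_count_rec(n, d):
    """Same quantity via the recurrence C(n, d) = C(n-1, d-1) + C(n-1, d)."""
    if d == 1:
        return 2
    if d >= n:
        return 2 ** n
    return orthant_count_rec(n - 1, d - 1) + orthant_count_rec(n - 1, d)


# The closed form and the recurrence agree on a small grid.
for n in range(1, 12):
    for d in range(1, 15):
        assert orthant_count(n, d) == orthant_count_rec(n, d)

# Corollary 5.2: C(n, d) = 2^n - C(n, n - d) for d < n, and C(2d, d) = 2^(2d-1).
for n in range(2, 12):
    for d in range(1, n):
        assert orthant_count(n, d) == 2 ** n - orthant_count(n, n - d)

print(orthant_count(10, 5))  # prints 512
```

For a support of size 10 and 5 observations this gives 512 orthants, or 256 sign patterns once negations are identified.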

5.2 Practical considerations 

In practice it is generally not feasible to generate all of the $C(|\mathcal{I}|,r)/2$ unique sign patterns. This
means that we would have to replace this term in (5.1) by the number of unique patterns actually
tried. For a given $X_0$ the actual probability of recovery is determined by a number of factors. First
of all, the linear combinations of the columns of the nonzero part of $X_0$ prescribe a hyperplane
and therefore a set of possible sign patterns. With each sign pattern is associated a face in $C$ that
may or may not map to a face in $AC$. In addition, depending on the probability distribution from
which the weight vectors $w$ are drawn, there is a certain probability for reaching each sign pattern.
Summing the probability of reaching those patterns that can be recovered gives the probability
$P(A, \mathcal{I}, X_0)$ of recovering with an individual random sample $w$. The probability of recovery after $t$
trials is then of the form

$$1 - [1 - P(A, \mathcal{I}, X_0)]^t.$$
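
To give a feel for how this expression behaves, the following sketch evaluates it and inverts it to find the number of trials needed for a target recovery probability. The helper names and the value of the single-sample probability are hypothetical and serve only as an illustration.

```python
from math import ceil, log


def recovery_after(p, t):
    """Probability of at least one success in t independent trials."""
    return 1.0 - (1.0 - p) ** t


def trials_needed(p, target):
    """Smallest t with recovery_after(p, t) >= target, for 0 < p < 1."""
    if not (0.0 < p < 1.0 and 0.0 <= target < 1.0):
        raise ValueError("need 0 < p < 1 and 0 <= target < 1")
    return max(0, ceil(log(1.0 - target) / log(1.0 - p)))


# Hypothetical single-sample success probability, for illustration only.
p = 0.05
for t in (1, 10, 100):
    print(t, round(recovery_after(p, t), 4))
print(trials_needed(p, 0.99))
```

Even a small single-sample probability therefore yields a high recovery rate after enough independent draws of $w$, which is what makes the number of iterations the main practical budget.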

To attain a certain sign pattern $e$, we need to find an $r$-vector $w$ such that $\mathrm{sign}(Xw) = e$. For a
positive sign on the $j$th position of the support we can take any vector $w$ in the open halfspace
$\{w \mid X^{j\rightarrow} w > 0\}$, where $X^{j\rightarrow}$ denotes the $j$th row of $X$, and likewise for negative signs. The
region of vectors $w$ in $\mathbb{R}^r$ that generates a desired sign pattern thus corresponds to the intersection
of $|\mathcal{I}|$ open halfspaces. The measure of this intersection as a fraction of $\mathbb{R}^r$ determines the
probability of sampling such a $w$. To formalize, define $\mathcal{K}$ as the cone generated by the rows of
$-\mathrm{diag}(e)X$, and the unit Euclidean $(r-1)$-sphere $S^{r-1} = \{x \in \mathbb{R}^r \mid \|x\|_2 = 1\}$. The intersection
of halfspaces then corresponds to the interior of the polar cone of $\mathcal{K}$:
$\mathcal{K}^\circ = \{x \in \mathbb{R}^r \mid x^T y \le 0,\ \forall y \in \mathcal{K}\}$. The fraction of $\mathbb{R}^r$ taken up by $\mathcal{K}^\circ$ is given
by the ratio of the $(r-1)$-content of $S^{r-1} \cap \mathcal{K}^\circ$ to the $(r-1)$-content of $S^{r-1}$ [21]. This quantity
coincides precisely with the definition of the external angle of $\mathcal{K}$ at the origin.
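
When $w$ has independent standard normal entries, its direction is uniformly distributed on the sphere, so the fraction described above equals the probability that $\mathrm{sign}(Xw) = e$. The sketch below estimates this probability by Monte Carlo sampling. It is an illustration under that sampling assumption, with a function name (`pattern_probability`) and toy matrix of our own choosing.

```python
import random


def pattern_probability(X, e, samples, rng):
    """Estimate P(sign(X w) = e) for w with i.i.d. standard normal entries."""
    r = len(X[0])
    hits = 0
    for _ in range(samples):
        w = [rng.gauss(0.0, 1.0) for _ in range(r)]
        # w lies in every open halfspace iff e_j * (row_j . w) > 0 for all j.
        if all(ej * sum(x * wi for x, wi in zip(row, w)) > 0
               for row, ej in zip(X, e)):
            hits += 1
    return hits / samples


# Toy case: the rows are the coordinate axes of R^2, so each of the four
# sign patterns corresponds to one quadrant and has probability 1/4.
X = [[1.0, 0.0],
     [0.0, 1.0]]
rng = random.Random(0)
print(pattern_probability(X, (1, -1), 20000, rng))
```

In two dimensions the exact value is simply the opening angle of the cone divided by $2\pi$; for rows $(1,0)$ and $(1,1)$ and pattern $(+,+)$ the cone spans 135 degrees, giving 0.375.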



5.3 Experiments 

In this section we illustrate the theoretical results from Section 5 and examine some practical
considerations that affect the performance of ReMBo. For all experiments that require the matrix
$A$, we use the same $20 \times 80$ matrix that was used in Section 4, and likewise for the supports $\mathcal{I}_s$. To



solve (1.2), we again use CVX in conjunction with SDPT3. We consider $x_0$ to be recovered from
$b = Ax_0 = AX_0 w$ if $\|x^* - x_0\|_\infty < 10^{-5}$, where $x^*$ is the computed solution.

The experiments that are concerned with the number of unique sign patterns generated depend
only on the $s \times r$ matrix $X$ representing the nonzero entries of $X_0$. Because an initial reordering of
the rows does not affect the number of patterns, those experiments depend only on $X$, $s = |\mathcal{I}|$, and
the number of observations $r$; the exact indices in the support set $\mathcal{I}$ are irrelevant for those tests.



5.3.1 Generation of unique sign patterns 

The practical performance of ReMBo depends on its ability to generate as many different sign
patterns using the columns in $X_0$ as possible. A natural question to ask then is how the number
of such patterns grows with the number of randomly drawn samples w. Although this ultimately 
depends on the distribution used for generating the entries in w, we shall, for sake of simplicity, 
consider only samples drawn from the normal distribution. As an experiment we take a 10 x 5 
matrix X with normally-distributed entries, and over 10^ trials record how often each sign-pattern 
(or negation) was reached, and in which trial they were first encountered. The results of this 
experiment are summarized in Figure 7. From the distribution in Figure 7(b) it is clear that the
occurrence levels of different orthants exhibit a strong bias. The most frequently visited orthant
pairs were reached up to 7.3 x 10^ times, while others, those hard to reach using weights from the 
normal distribution, were observed only four times over all trials. The efficiency of ReMBo depends 
on the rate of encountering new sign patterns. Figure 7(c) shows how the average rate changes over
the number of trials. The curves in Figure 7(d) illustrate the theoretical probability of recovery
in (5.1), with $C(n,d)/2$ replaced by the number of orthant pairs at a given iteration, and with
face counts determined as in Section 4, for three instances with support cardinality $s = 10$ and
$r = 5$ observations.
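
A scaled-down version of this sampling experiment can be sketched as follows. It is not the script used for Figure 7: the function name `count_sign_patterns`, the random seed, and the number of trials are our own choices for illustration, and a pattern is identified with its negation by normalising the sign of its first entry.

```python
import random
from math import comb


def count_sign_patterns(X, trials, rng):
    """Count distinct sign patterns of X w, up to negation, over random w."""
    r = len(X[0])
    seen = set()
    for _ in range(trials):
        w = [rng.gauss(0.0, 1.0) for _ in range(r)]
        signs = tuple(sum(x * wi for x, wi in zip(row, w)) > 0 for row in X)
        # Identify a pattern with its negation: force the first sign positive.
        if not signs[0]:
            signs = tuple(not s for s in signs)
        seen.add(signs)
    return len(seen)


s, r = 10, 5
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(r)] for _ in range(s)]

# Theoretical maximum number of pattern pairs, C(s, r) / 2, from (5.2).
bound = sum(comb(s - 1, i) for i in range(r))
for trials in (100, 1000, 20000):
    print(trials, count_sign_patterns(X, trials, rng), "of at most", bound)
```

The count can never exceed the bound, and how quickly it approaches the bound reflects the bias in how often individual orthants are reached.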



5.3.2 Role of X. 

Although the number of orthants that a hyperplane can intersect does not depend on the basis
with which it was generated, this choice does greatly influence the ability to sample those orthants.
Figure 8 shows two ways in which this can happen. In part (a) we sampled the number of unique
sign patterns for two different $9 \times 5$ matrices $X$, each with columns scaled to unit $\ell_2$-norm. The
entries of the first matrix were independently drawn from the normal distribution, while those in
the second were generated by repeating a single column drawn likewise and adding small random
perturbations to each entry. This caused the average angle between any pair of columns to decrease
from 65 degrees in the random matrix to a mere 8 degrees in the perturbed matrix, and greatly reduced
the probability of reaching certain orthants. The same idea applies to the case where $d > n$,
as shown in part (b) of the same figure. Although choosing $d$ greater than $n$ does not increase
the number of orthants that can be reached, it does make reaching them easier, thus allowing
ReMBo to work more efficiently. Hence, we can expect ReMBo to have higher recovery on average
when the number of columns in $X_0$ increases and when they have a lower mutual coherence
$\mu(X) = \max_{i \ne j} |x_i^T x_j| / (\|x_i\|_2 \cdot \|x_j\|_2)$.
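
For reference, the mutual coherence of the columns of a small matrix can be computed as in the sketch below, which takes the largest normalised inner product over all pairs of distinct columns, as is standard for this quantity. The function name and the toy matrices are ours.

```python
from math import sqrt


def mutual_coherence(X):
    """Largest absolute normalised inner product between distinct columns of X."""
    cols = list(zip(*X))
    norms = [sqrt(sum(v * v for v in c)) for c in cols]
    best = 0.0
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            dot = sum(a * b for a, b in zip(cols[i], cols[j]))
            best = max(best, abs(dot) / (norms[i] * norms[j]))
    return best


# Orthogonal columns have coherence 0; nearly parallel columns approach 1.
print(mutual_coherence([[1.0, 0.0], [0.0, 1.0]]))
print(mutual_coherence([[1.0, 1.0], [0.0, 0.1]]))
```

A perturbed-copy matrix like the second one in Figure 8(a) has coherence close to 1, matching the small average angle between its columns.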
Figure 7: Sampling the sign patterns for a $10 \times 5$ matrix $X$, with (a) number of unique sign patterns
versus number of trials, (b) relative frequency with which each orthant is sampled, (c) average
number of new sign patterns per iteration as a function of iterations, and (d) theoretical probability
of recovery using ReMBo for three instances of $X_0$ with row sparsity $s = 10$, and $r = 5$ observations.




Figure 8: Number of unique sign patterns for (a) two $9 \times 5$ matrices $X$ with columns scaled to
unit $\ell_2$-norm; one with entries drawn independently from the normal distribution, and one with a
single random column repeated and random perturbations added, and (b) $10 \times r$ matrices with
$r = 10, 12, 15$.

Figure 9: Effect of limiting the number of weight vectors $w$ on (a) the distribution of unique orthant
counts for $10 \times r$ random matrices $X$, where solid lines give the median number, dashed lines
indicate the minimum and maximum values, and the top solid line is the theoretical maximum; (b)-(c)
the average performance of the ReMBo-$\ell_1$ algorithm (solid) for fixed $20 \times 80$ matrix $A$ and three
different support sizes $s = 8, 9, 10$, along with the average predicted performance (dashed). The
support patterns used are the same as those used for Figure 4.



5.3.3 Limiting the number of iterations 

The number of iterations used in the previous experiments greatly exceeds what is practically
feasible: we cannot afford to run ReMBo until all possible sign patterns have been tried, even if
there were a way to detect that the limit had been reached. Realistically, we should set the number
of iterations to a fixed maximum that depends on the computational resources available and the
problem setting.

In Figure 7 we show the unique orthant count as a function of iterations and the predicted
recovery rate. When using only a limited number of iterations it is interesting to know what the
distribution of unique orthant counts looks like. To find out, we drew 1,000 random $X$ matrices
for each size $s \times r$, with $s = 10$ nonzero rows fixed, and the number of columns ranging over
$r = 1, \ldots, 20$. For each $X$ we counted the number of unique sign patterns attained after respectively
1,000 and 10,000 iterations. The resulting minimum, maximum, and median values are plotted
in Figure 9(a) along with the theoretical maximum. More interesting, of course, is the average
recovery rate of ReMBo with those numbers of iterations. For this test we again used the $20 \times 80$
matrix $A$ with predetermined support $\mathcal{I}$, and with success or failure of each sign pattern on that
support precomputed. For each value of $r = 1, \ldots, 20$ we generated random matrices $X$ on $\mathcal{I}$
and ran ReMBo with the maximum number of iterations set to 1,000 and 10,000. To save on
computing time, we compared the on-support sign pattern of each combined coefficient vector $Xw$
to the known results instead of solving $\ell_1$. The average recovery rate thus obtained is plotted in
Figures 9(b)-(c), along with the average of the predicted performance using (5.1) with $C(n,d)/2$
replaced by orthant counts found in the previous experiment.
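
The lookup shortcut described above can be sketched as follows: instead of solving an $\ell_1$ problem in every iteration, the sign pattern of $Xw$ is compared with a precomputed set of recoverable patterns. The set `recoverable`, the function name `rembo_lookup`, and the toy matrix are hypothetical stand-ins, not the data used for Figure 9.

```python
import random


def rembo_lookup(X, recoverable, max_iter, rng):
    """Return the first iteration at which sign(X w) is recoverable, else None.

    recoverable -- set of sign patterns (tuples of +1/-1) assumed to be known
                   in advance and closed under negation
    """
    r = len(X[0])
    for it in range(1, max_iter + 1):
        w = [rng.gauss(0.0, 1.0) for _ in range(r)]
        pattern = tuple(
            1 if sum(x * wi for x, wi in zip(row, w)) > 0 else -1 for row in X
        )
        if pattern in recoverable:
            return it
    return None  # iteration limit reached without a recoverable pattern


# Toy instance with s = 3 nonzero rows and r = 2 observations.
X = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
recoverable = {(1, 1, 1), (-1, -1, -1)}  # hypothetical precomputed outcomes
rng = random.Random(0)
print(rembo_lookup(X, recoverable, 1000, rng))
```

Averaging the success indicator of such runs over many random $X$ gives an empirical recovery rate for a given iteration limit.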

6 Conclusions 

The MMV problem is often solved by minimizing the sum-of-row norms of the unknown coefficients
$X$. We show that the (local) uniform recovery properties, i.e., recovery of all $X_0$ with a fixed row
support $\mathcal{I} = \mathrm{Supp}_{\mathrm{row}}(X_0)$, cannot exceed that of $\ell_{1,1}$, the sum of $\ell_1$ norms. This is despite the fact
that $\ell_{1,1}$ reduces to solving the basis pursuit problem (1.2) for each column separately, which does
not take advantage of the fact that all vectors in $X_0$ are assumed to have the same support. A
consequence of this observation is that the use of restricted isometry techniques to analyze (local)
uniform recovery using sum-of-norm minimization can at best give improved bounds on $\ell_1$ recovery.

Empirically, minimization with $\ell_{1,2}$, the sum of $\ell_2$ norms, clearly outperforms $\ell_{1,1}$ on individual
problem instances: for supports where uniform recovery fails, $\ell_{1,2}$ recovers more cases than $\ell_{1,1}$.
We construct cases where $\ell_{1,2}$ succeeds while $\ell_{1,1}$ fails, and vice versa. From the construction where
only $\ell_{1,2}$ succeeds it also follows that the relative magnitudes of the coefficients in $X_0$ matter for
recovery. This is unlike $\ell_{1,1}$ recovery, where only the support and the sign patterns matter. This
implies that the notion of faces, so useful in the analysis of $\ell_1$, disappears.

We show that the performance of $\ell_{1,1}$ outside the uniform-recovery regime degrades rapidly
as the number of observations increases. We can turn this situation around, and increase the
performance with the number of observations, by using a boosted-$\ell_1$ approach. This technique
aims to uncover the correct support based on basis pursuit solutions for individual observations.
Boosted-$\ell_1$ is a special case of the ReMBo algorithm, which repeatedly takes random combinations
of the observations, allowing it to sample many more sign patterns in the coefficient space. As a
result, the potential recovery rates of ReMBo (at least in combination with an $\ell_1$ solver) are
much higher than those of boosted-$\ell_1$. ReMBo can be used in combination with any solver for the single
measurement problem $Ax = b$, including greedy approaches and reweighted $\ell_1$ [4]. The recovery
rate of greedy approaches may be lower than that of $\ell_1$, but the algorithms are generally much faster, thus
giving ReMBo the chance to sample more random combinations. Another advantage of ReMBo,
even more so than boosted-$\ell_1$, is that it can be easily parallelized.

Based on the geometrical interpretation of ReMBo-$\ell_1$ (cf. Figure 5), we conclude that,
theoretically, its performance does not increase with the number of observations after this number
reaches the number of nonzero rows. In addition we develop a simplified model for the performance
of ReMBo-$\ell_1$. To improve the model we would need to know the distribution of faces in the
cross-polytope $C$ that map to faces on $AC$, and the distribution of external angles for the cones
generated by the signed rows of the nonzero part of $X_0$.

It would be very interesting to compare the recovery performance between $\ell_{1,2}$ and ReMBo-$\ell_1$.
However, we consider this beyond the scope of this paper.

All of the numerical experiments in this paper are reproducible. The scripts used to run the 
experiments and generate the figures can be downloaded from 

http://www.cs.ubc.ca/~mpf/jointsparse